Word order phenomena in conversational spoken French: A study on task-oriented dialogue corpora and its consequences on language processing
Abstract
This paper presents a corpus study that investigates word order variations (WOVs) in spontaneous spoken French and their consequences for the parsing techniques used in Natural Language Processing. We studied four task-oriented spoken dialogue corpora covering different application tasks (air transport or tourism information, switchboard calls). Two corpora consist of telephone conversations, while the other two involve face-to-face interaction. Every word order variation was manually annotated by three experts following a cross-validation procedure. Our results show that, although conversational spoken French is highly affected by WOVs, it should still be considered a rigid word order language: WOVs follow a striking structural regularity and very rarely result in discontinuous syntactic structures. As a result, projective parsers remain well adapted to conversational spoken French.
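The abstract's conclusion about parsers rests on how often word order variations produce discontinuous (non-projective) dependency structures. As an illustration only, the following minimal Python sketch checks projectivity using one common definition: an arc from head h to dependent d is projective if h dominates every token strictly between h and d, and a tree is projective if all its arcs are. The list-of-heads encoding (index 0 as an artificial root whose head is -1) and the function names `dominates` and `is_projective` are assumptions made for this example; this is not the authors' annotation procedure or parsing tool.

```python
from typing import List


def dominates(heads: List[int], ancestor: int, node: int) -> bool:
    """Return True if `ancestor` dominates `node` (reflexive, transitive).

    Assumed (hypothetical) encoding: heads[i] is the head of token i,
    and heads[0] == -1 marks the artificial root. The tree must be
    well formed (no cycles), otherwise this loop would not terminate.
    """
    while node != -1:
        if node == ancestor:
            return True
        node = heads[node]  # climb one step towards the root
    return False


def is_projective(heads: List[int]) -> bool:
    """Return True if no arc spans a token its head does not dominate,
    i.e. the tree contains no discontinuous structure."""
    for dep in range(1, len(heads)):
        head = heads[dep]
        lo, hi = min(head, dep), max(head, dep)
        # Every token strictly between head and dependent must be
        # dominated by the head for the arc to be projective.
        for k in range(lo + 1, hi):
            if not dominates(heads, head, k):
                return False
    return True


if __name__ == "__main__":
    # Toy trees, not taken from the corpora described above.
    print(is_projective([-1, 2, 0, 4, 2]))  # True
    print(is_projective([-1, 3, 0, 2, 1]))  # False: arc 1 -> 4 spans token 2
```

Applied sentence by sentence, a check like this could count how many trees in an annotated corpus contain at least one discontinuity; the head lists above are purely illustrative.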